{epiprocess} & {epipredict}

R packages to ramp up forecasting systems


Daniel J. McDonald, Ryan J. Tibshirani, Logan C. Brooks

and CMU’s Delphi Group

Stanford STATS/BIODS 352 — 12 April 2023

Background

  • The COVID-19 pandemic required forecasting systems to be implemented quickly.

  • Basic processing (outlier detection, reporting issues, geographic granularity) was implemented in parallel and was error prone

  • Data revisions complicate evaluation

  • Simple models often outperformed complicated ones

  • Custom software not easily adapted / improved by other groups

  • Hard for public health actors to borrow / customize community techniques

{epiprocess}

Basic processing operations and data structures

  • Calculate rolling statistics
  • Fill / impute gaps
  • Examine correlations
  • Store revision history smartly
  • Inspect revision patterns
  • Find / correct outliers
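
As a rough illustration of the first operation in the list above, a trailing 7-day average can be sketched in base R. This is not the {epiprocess} API; the data and the helper name `trailing_mean` are made up for the example.

```r
# Hypothetical daily case counts for one location (made-up numbers)
cases <- c(10, 12, 9, 14, 20, 18, 15, 30, 28, 25)

# Trailing mean: each value averages the current day and the (window - 1)
# days before it. stats::filter() with sides = 1 uses only current and
# past values, so the first (window - 1) results are NA.
trailing_mean <- function(x, window = 7) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

cases_7dav <- trailing_mean(cases)
```

Using only past values matters for forecasting: a centered window would leak future observations into the feature.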

Revision patterns

Outlier handling

# Rolling-median based detection
bc <- bc %>% mutate(outliers = detect_outlr_rm(time_value, cases))
# STL-decomposition based detection
ny <- ny %>% mutate(outliers = detect_outlr_stl(time_value, cases))
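
To give a feel for what a rolling-median detector does, here is a simplified base R sketch. It is an illustration under assumed settings (window of 5, threshold of 3 robust standard deviations), not the {epiprocess} implementation, and `flag_outliers` is a hypothetical helper.

```r
# Hypothetical daily counts with one obvious spike (made-up numbers)
cases <- c(10, 11, 9, 12, 10, 95, 11, 10, 12, 9, 11)

# Flag points that sit far from a centered rolling median.
flag_outliers <- function(y, window = 5, threshold = 3) {
  # Rolling median; the window length must be odd
  med <- stats::runmed(y, k = window, endrule = "median")
  resid <- y - med
  # Robust estimate of the spread of the residuals
  scale <- stats::mad(resid)
  abs(resid) > threshold * scale
}

flagged <- which(flag_outliers(cases))
```

With these made-up numbers, only the sixth observation (the spike of 95) is flagged.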

{epipredict}

A forecasting framework

  • Flatline forecaster
  • AR-type models
  • Backtest using the versioned data
  • Easily create features
  • Quickly pivot to new tasks
  • Highly customizable for advanced users
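
Backtesting on versioned data, mentioned in the list above, means training only on what was known at each forecast date. A minimal base R sketch of that idea follows; the data frame layout and the helper `as_of` are assumptions made for illustration, not the packages' own data structures.

```r
# Hypothetical revision history: each row is the value reported for
# time_value in the data version published on `version` (made-up numbers)
archive <- data.frame(
  time_value = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02",
                         "2021-01-02", "2021-01-03")),
  version    = as.Date(c("2021-01-02", "2021-01-05", "2021-01-03",
                         "2021-01-06", "2021-01-04")),
  cases      = c(100, 120, 80, 95, 60)
)

# Snapshot of the data as it was known on a given date: keep versions
# published on or before that date, then the latest one per time_value.
as_of <- function(archive, date) {
  known <- archive[archive$version <= date, ]
  known <- known[order(known$time_value, known$version), ]
  known[!duplicated(known$time_value, fromLast = TRUE), ]
}

snapshot_early <- as_of(archive, as.Date("2021-01-04"))
snapshot_late  <- as_of(archive, as.Date("2021-01-06"))
```

A forecaster evaluated on `snapshot_late` would see revised values that were not available on 2021-01-04, which tends to make backtest results look better than real-time performance.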

{epipredict}

Canned forecasters that work out of the box.

You can do a limited amount of customization.

We currently provide:

  • Baseline flat-line forecaster
  • Autoregressive forecaster (not a classical “AR” time-series model; you don’t want that here)
  • Autoregressive classifier
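
The idea behind a flat-line baseline can be sketched in a few lines of base R: carry the last observation forward and build intervals from the errors that rule would have made in the past. This is a simplified illustration on made-up data; the {epipredict} forecaster may construct its intervals differently, and `flatline_forecast` is a hypothetical helper.

```r
# Hypothetical weekly death rates for one location (made-up numbers)
death_rate <- c(1.0, 1.2, 1.1, 1.5, 1.4, 1.8, 1.7)

flatline_forecast <- function(y, ahead = 1, levels = c(0.1, 0.5, 0.9)) {
  n <- length(y)
  # Point forecast: carry the last observation forward
  point <- y[n]
  # Errors the same rule would have made in the past at this horizon
  past_errors <- y[(ahead + 1):n] - y[1:(n - ahead)]
  # Shift the empirical error quantiles to the point forecast
  q <- point + stats::quantile(past_errors, levels, names = FALSE)
  list(point = point, quantiles = q)
}

fc <- flatline_forecast(death_rate, ahead = 1)
```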

Basic autoregressive forecaster

  • Predict death_rate 1 week ahead, using 0-, 7-, and 14-day lags of cases and deaths.
  • Use lm for estimation. Also create “intervals”.

library(epipredict)
jhu <- case_death_rate_subset # grab some built-in data
canned <- arx_forecaster(
  epi_data = jhu, 
  outcome = "death_rate", 
  predictors = c("case_rate", "death_rate")
)

The output is basically ready to submit to the COVID-19 Forecast Hub
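
To make the "lags plus lm" idea concrete, here is a hand-rolled base R sketch for a single location. It uses synthetic data and hypothetical helpers (`lag_by`, `lead_by`); it only illustrates the kind of regression involved and is not what arx_forecaster does internally.

```r
# Synthetic single-location daily series (made-up data, for illustration only)
set.seed(1)
n <- 120
case_rate  <- 20 + cumsum(rnorm(n))
death_rate <- 0.5 + 0.02 * case_rate + rnorm(n, sd = 0.05)

# Value of x observed k days before each row (NA where unavailable)
lag_by  <- function(x, k) c(rep(NA, k), head(x, length(x) - k))
# Value of x observed k days after each row (NA where unavailable)
lead_by <- function(x, k) c(tail(x, length(x) - k), rep(NA, k))

feats <- data.frame(
  y           = lead_by(death_rate, 7),   # response: death_rate 7 days ahead
  case_lag0   = case_rate,
  case_lag7   = lag_by(case_rate, 7),
  case_lag14  = lag_by(case_rate, 14),
  death_lag0  = death_rate,
  death_lag7  = lag_by(death_rate, 7),
  death_lag14 = lag_by(death_rate, 14)
)

# lm() drops rows containing NA by default
fit <- lm(y ~ ., data = feats)

# Predict from the most recent row, whose predictors are all observed
latest <- feats[n, -1]
pred <- predict(fit, newdata = latest)
```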

Adjust lots of built-in options

rf <- arx_forecaster(
  epi_data = jhu, 
  outcome = "death_rate", 
  predictors = c("case_rate", "death_rate", "fb-survey"), # assumes an "fb-survey" column has been joined onto jhu
  trainer = parsnip::rand_forest(mode = "regression"), # use ranger
  args_list = arx_args_list(
    ahead = 14, # 2-week horizon
    lags = list(c(0:4, 7, 14), c(0, 7, 14), c(0:7, 14)), # bunch of lags
    levels = c(0.01, 0.025, 1:19/20, 0.975, 0.99), # 23 ForecastHub quantiles
    quantile_by_key = "geo_value" # vary q-forecasts by location
  )
)

{epipredict}

A framework for customizing from modular components.

  1. Preprocessor: do things to the data before model training
  2. Trainer: train a model on data, resulting in an object
  3. Predictor: make predictions, using a fitted model object
  4. Postprocessor: do things to the predictions before returning
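
The four components above can be mimicked with plain functions to show how they fit together. This base R sketch uses synthetic data and hypothetical function names; it is a toy analogue of the design, not the {epipredict} or {tidymodels} interface.

```r
# Synthetic series (made-up data)
set.seed(2)
y <- 5 + cumsum(rnorm(60, sd = 0.3))

# 1. Preprocessor: raw series -> features and a 7-step-ahead target
preprocess <- function(y, ahead = 7) {
  n <- length(y)
  data.frame(lag0 = y, target = c(y[(ahead + 1):n], rep(NA, ahead)))
}

# 2. Trainer: fit a model on the rows with a known target
train <- function(df) lm(target ~ lag0, data = df)

# 3. Predictor: apply the fitted model to the most recent features
make_prediction <- function(fit, df) {
  unname(predict(fit, newdata = df[nrow(df), , drop = FALSE]))
}

# 4. Postprocessor: clean up predictions, here by forcing non-negativity
postprocess <- function(pred) pmax(pred, 0)

df <- preprocess(y)
forecast <- postprocess(make_prediction(train(df), df))
```

Keeping the steps separate means any one of them can be swapped (a different model in step 2, a different cleanup in step 4) without touching the others.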

A very specialized plug-in to {tidymodels}

Do (almost) anything manually

# A preprocessing "recipe" that turns raw data into features / response
r <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, lag = c(0, 1, 2, 3, 7, 14)) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 14) %>%
  step_epi_naomit()

# A postprocessing routine describing what to do to the predictions
f <- frosting() %>%
  layer_predict() %>%
  layer_threshold(.pred, lower = 0) %>% # predictions/intervals should be non-negative
  layer_add_target_date(target_date = max(jhu$time_value) + 14) %>%
  layer_add_forecast_date(forecast_date = max(jhu$time_value))

# Bundle up the preprocessor, training engine, and postprocessor
# We use quantile regression
ewf <- epi_workflow(r, quantile_reg(tau = c(.1, .5, .9)), f)

# Fit it to data (we could fit this to ANY data that has the same format)
trained_ewf <- ewf %>% fit(jhu)

# Examine the recipe to determine what data we need to make the prediction
latest <- get_test_data(r, jhu)

# We could make predictions using the same model on ANY test data
preds <- trained_ewf %>% predict(new_data = latest)

Packages are under active development

Thanks:

  • The whole CMU Delphi Team (across many institutions)
  • Optum/UnitedHealthcare, Change Healthcare
  • Google, Facebook, Amazon Web Services
  • Quidel, SafeGraph, Qualtrics
  • Centers for Disease Control and Prevention
  • Council of State and Territorial Epidemiologists